Tolstoy Digital: Mining Biographical Data in Literary Heritage Editions
Authors
Abstract
This paper presents a solution for mining biographical information from the commentaries on Leo Tolstoy’s letters. It is implemented as part of the Tolstoy Digital project, a semantically marked-up web publication of the 90-volume complete collection of Leo Tolstoy’s works. The extracted biographical information will be used to create an open database of all persons connected with Tolstoy or his works. The paper also accounts for various subtleties of the commentary apparatus and pays special attention to specific difficulties of biographical information extraction, such as defining the boundaries of expressions that denote a profession and handling non-standardized syntactic constructions for kinship relations.
Similar resources
Toward Algorithmic Discovery of Biographical Information in Local Gazetteers of Ancient China
Difangzhi (地方志) is a large collection of local gazetteers compiled by local governments of China, and the documents provide invaluable information about the host locality. This paper reports the current status of using natural language processing and text mining methods to identify biographical information of government officers so that we can add the information into the China Biographical Dat...
Words in Contexts: Digital Editions of Literary Journals in the "AAC - Austrian Academy Corpus"
In this paper two highly innovative digital editions will be presented. For the creation and the implementation of these editions the latest developments within corpus research have been taken into account. The digital editions of the historical literary journals "Die Fackel" (published by Karl Kraus in Vienna from 1899 to 1936) and "Der Brenner" (published by Ludwig Ficker in Innsbruck from 19...
Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works
This article describes a joint research work between Monash University and the University of Alicante, where software originally meant for plagiarism and copy detection in academic works is successfully applied to perform comparative analysis of different editions of literary works. The experiments were performed with Spanish texts from the Miguel de Cervantes digital library. The results have p...
Digitisation of Literary Heritage Using Open Standards
The paper presents the methodology, technology and results of a collaborative Slovenian project aimed at e-publishing text-critical editions of literary heritage. The materials exhibit great complexity, as they are made available not only in facsimile but also in several interconnected transcriptions, and can include notes, glossaries, dictionaries, links to external resources, multimedia prese...
A.P. Chekhov’s Works in Interpretation by F.E. Paktovsky
This paper presents a conceptual analysis of the essay by Paktovsky (1901), which concentrates on the works of Chekhov. The relevance of the research is determined by the significance of the literary figure for the history of Russian criticism of the 19th and 20th centuries, the importance of his vision concerning the writing of the authors of Russian literature of the turn of the century, as well a...